A Chinese text classification model based on radicals and character distinctions


Abstract

Chinese characters are generally correlated with their semantic meanings, and the structure of radicals, in particular, can be a clear indication of how characters are related to each other. In the simplification movement, some different traditional characters have been mapped into one simplified character (a many-to-one mapping), resulting in the phenomenon of 'one simplified character corresponding to many traditional characters'. Compared with simplified characters, traditional characters contain richer structural information, which is also more meaningful for understanding. Traditional approaches to text modelling often overlook the content of characters and the role of human cognitive behaviour in the process of comprehension. Hence, we propose a text classification model derived from the construction methods and evolution of Chinese characters. The model consists of two branches: a traditional-character branch and an attention module based on a radical branch. Specifically, we first develop a sequential module to obtain the sequence information of texts. Afterwards, an associated-word module, which uses the part head (i.e., the radical) as a medium, is designed to filter out keywords with high differentiation among auxiliary units. An attention module is then implemented to balance the importance of each keyword in a particular context. Experiments on three datasets demonstrate the validity and plausibility of our proposed method.
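The abstract describes the radical branch only at a high level, so the following is a hypothetical sketch, not the authors' implementation. It assumes a toy character-to-radical table (`RADICALS`, invented for illustration), scores each radical's class differentiation as 1 minus the normalized entropy of its class distribution (an assumed criterion; the paper's actual filter is not stated here), keeps characters whose radical scores above a threshold as keywords, and then weights keyword vectors with plain softmax dot-product attention against a context vector. All function names are illustrative.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy lookup table: character -> radical.
# A real system would use a complete radical dictionary.
RADICALS = {
    "河": "氵", "湖": "氵", "海": "氵",
    "樹": "木", "林": "木", "森": "木",
    "的": "白", "是": "日",
}

def radical_differentiation(docs, labels):
    """Score each radical by how concentrated its occurrences are in one class.

    Score = 1 - H(class | radical) / log(num_classes): 1.0 means the radical
    appears in a single class only, 0.0 means it is spread uniformly.
    """
    counts = defaultdict(Counter)
    for text, label in zip(docs, labels):
        for ch in text:
            rad = RADICALS.get(ch)
            if rad is not None:
                counts[rad][label] += 1
    n_classes = len(set(labels))
    scores = {}
    for rad, by_class in counts.items():
        total = sum(by_class.values())
        entropy = -sum((c / total) * math.log(c / total) for c in by_class.values())
        if n_classes > 1:
            scores[rad] = 1.0 - entropy / math.log(n_classes)
        else:
            scores[rad] = 1.0
    return scores

def filter_keywords(text, scores, threshold=0.5):
    """Keep characters whose radical is highly class-discriminative."""
    return [ch for ch in text if scores.get(RADICALS.get(ch), 0.0) >= threshold]

def softmax(xs):
    # Subtract the max for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend(keyword_vecs, context_vec):
    """Dot-product attention: weight keyword vectors by relevance to the context."""
    if not keyword_vecs:
        raise ValueError("need at least one keyword vector")
    raw = [sum(k * c for k, c in zip(vec, context_vec)) for vec in keyword_vecs]
    weights = softmax(raw)
    dim = len(context_vec)
    pooled = [sum(w * vec[i] for w, vec in zip(weights, keyword_vecs)) for i in range(dim)]
    return weights, pooled

if __name__ == "__main__":
    docs = ["河湖的", "海是", "樹林的", "森是"]
    labels = ["water", "water", "plant", "plant"]
    scores = radical_differentiation(docs, labels)
    print(scores)
    print(filter_keywords("河的樹", scores))
    print(attend([[1.0, 0.0], [0.0, 1.0]], [2.0, 0.0]))
```

In this toy corpus the water radical 氵 and the tree radical 木 each occur in only one class and score 1.0, while 白 (from 的) and 日 (from 是) occur in both classes and score 0.0, so only content-bearing characters survive the filter. The entropy criterion is just one simple way to express "high differentiation"; a learned scoring function would fit the abstract's description equally well.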


Similar resources

Text Content Filtering Based on Chinese Character Reconstruction from Radicals

Content filtering through keyword matching is widely adopted in network censoring and has proven successful. However, a technique to bypass this kind of censorship by decomposing Chinese characters has appeared recently. Chinese characters are combinations of radicals, and splitting characters into radicals poses a big obstacle to keyword filtering. To tackle this challenge, we proposed the first ...

Feature Selection on Chinese Text Classification Using Character N-Grams

In this paper, we perform Chinese text classification using n-gram text representation on TanCorp, a new large corpus specifically for Chinese text classification containing more than 14,000 texts divided into 12 classes. We use different n-gram features (1- and 2-grams, or 1-, 2- and 3-grams) to represent documents. Different feature weights (absolute text frequency, relative text frequency, absolute n-gram f...

Chinese Character Classification Based on Rough Set and SVM Algorithm

In this paper, we present an integrated approach combining Rough Set theory and the SVM algorithm. The approach will be divided into two steps. The first step is rough classification with Rough Set, in which rules are induced from the information system. The second step is precise classification based on the SVM algorithm; in this step we present two new fundamental principles to help us select b...

Mortality Forecasting Based on the Lee-Carter Model

Over the past decades, a number of approaches have been applied to forecasting mortality. In 1992, a new method for the long-run forecast of the level and age pattern of mortality was published by Lee and Carter. This method was welcomed by many authors, so it was extended through a wider class of generalized, parametric and nonlinear models. This model represents one of the most influential recent d...

A Character-Net Based Chinese Text Segmentation Method

The segmentation of Chinese texts is a key process in Chinese information processing. The difficulties in segmentation lie in processing ambiguous character strings and unknown Chinese words. To obtain the correct result, the first step is to identify all possible candidate Chinese words in a text. In this paper, a data structure, the Chinese-character-net, is put forward; then, based o...

Journal

Journal title: IEEE Access

Year: 2023

ISSN: 2169-3536

DOI: https://doi.org/10.1109/access.2023.3257339